Libraries


In [1]:
%%capture

!pip install python-google-places
!pip install langdetect
!pip install bnlp_toolkit
!wget https://www.omicronlab.com/download/fonts/kalpurush.ttf
!wget https://www.omicronlab.com/download/fonts/Siyamrupali.ttf
!pip install folium
!pip install geopandas
In [2]:
from googleplaces import GooglePlaces, types, lang
import time
import pandas as pd

from IPython.display import Markdown, display

import seaborn as sns
import matplotlib.pyplot as plt
from plotly.subplots import make_subplots
from wordcloud import WordCloud
import re

def printmd(string):
  display(Markdown(string))

from langdetect import detect
import unicodedata
import html

import folium
# Import folium MarkerCluster plugin
from folium.plugins import MarkerCluster
# Import folium MousePosition plugin
from folium.plugins import MousePosition
# Import folium DivIcon plugin
from folium.features import DivIcon

Location Dataset


This dataset lists the Upazilas/Thanas of each district of Bangladesh.

Credit: Mobile network coverage in Bangladeshi Upazila or Thana (Kaggle)

In [ ]:
!mkdir ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets download -d mushfiqurrobin/network-coverage

!mkdir network-coverage
!unzip network-coverage.zip -d network-coverage
Downloading network-coverage.zip to /content
  0% 0.00/369k [00:00<?, ?B/s]
100% 369k/369k [00:00<00:00, 42.9MB/s]
Archive:  network-coverage.zip
  inflating: network-coverage/Coverage.csv  
In [ ]:
df = pd.read_csv("/content/network-coverage/Coverage.csv")

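# .copy() gives an independent frame, so the in-place operations below act on df_area itself
df_area = df[['Upazila_or_Thana', 'District']].copy()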

# Checking For Missing Values
total = df_area.isnull().sum().sort_values(ascending=False)
percent = (df_area.isnull().sum()/df_area.isnull().count()).sort_values(ascending=False)
missing_data = pd.concat([total, percent*100], axis=1, keys=['Total', 'Percent'])
display(missing_data.head(5))
Total Percent
District 0 0.0
Upazila_or_Thana 0 0.0
In [ ]:
# Checking for Duplicate Rows 
df_area.duplicated().sum()
Out[ ]:
21800
In [ ]:
# Dropping Duplicates
df_area.drop_duplicates(keep="first", inplace=True)
df_area.reset_index(drop=True, inplace=True)

df_area.to_csv("locations.csv", index=False)

Initialization


Here, I combine each Upazila/Thana with its district into a single string and store the strings in the locations list. I will use this list later for the search queries.

I have also initialized the search radius to 2,000 meters (2 km).

Finally, I will store the restaurants' information in restaurant_data.

In [ ]:
API_KEY = "YOUR API KEY"

google_places = GooglePlaces(API_KEY)

restaurant_data = []
radius = 2000

# Converting the list of Upazilla/Thana and District into a combined string
locations = []
list_areas = df_area.values.tolist()

for area in list_areas:
  location_name = ', '.join([str(item) for item in area])
  locations.append(location_name)

print(locations)
['Kawkhali, Pirojpur', 'Mathbaria, Pirojpur', 'Nazirpur, Pirojpur', 'Nesarabad, Pirojpur', 'Pirojpur Sadar, Pirojpur', 'Zianagar, Pirojpur', 'Akhaura, Brahmanbaria', 'Ashuganj, Brahmanbaria', 'Brahmanbaria Sadar, Brahmanbaria', 'Bancharampur, Brahmanbaria', 'Bijoynagar, Brahmanbaria', 'Kasba, Brahmanbaria', 'Nabinagar, Brahmanbaria', 'Nasirnagar, Brahmanbaria', 'Sarail, Brahmanbaria', 'Alikadam, Bandarban', 'Bandarban Sadar, Bandarban', 'Lama, Bandarban', 'Naikhongchhari, Bandarban', 'Rowangchari, Bandarban', 'Ruma, Bandarban', 'Thanchi, Bandarban', 'Chandpur Sadar, Chandpur', 'Faridganj, Chandpur', 'Haimchar, Chandpur', 'Hajiganj, Chandpur', 'Kachua, Chandpur', 'Matlab (Dakshin), Chandpur', 'Matlab (Uttar), Chandpur', 'Shahrasti, Chandpur', 'Anwara, Chittagong', 'Banskhali, Chittagong', 'Boalkhali, Chittagong', 'Chandanish, Chittagong', 'Fatikchari, Chittagong', 'Karnaphuli, Chittagong', 'Lohagara, Chittagong', 'Mirsharai, Chittagong', 'Patiya, Chittagong', 'Rangunia, Chittagong', 'Raozan, Chittagong', 'Sandwip, Chittagong', 'Satkania, Chittagong', 'Sitakunda, Chittagong', "Chakoria, Cox'S Bazar", "Cox's Bazar Sadar, Cox'S Bazar", "Kutubdia, Cox'S Bazar", "Moheshkhali, Cox'S Bazar", "Pekua, Cox'S Bazar", "Ramu, Cox'S Bazar", "Teknaf, Cox'S Bazar", "Ukhiya, Cox'S Bazar", 'Barura, Cumilla', 'Brahmanpara, Cumilla', 'Burichong, Cumilla', 'Chandina, Cumilla', 'Chauddagram, Cumilla', 'Cumilla Sadar, Cumilla', 'Cumilla Sadar Daksin, Cumilla', 'Daudkandi, Cumilla', 'Debidwar, Cumilla', 'Homna, Cumilla', 'Laksham, Cumilla', 'Lalmai, Cumilla', 'Meghna, Cumilla', 'Monohorganj, Cumilla', 'Muradnagar, Cumilla', 'Nangalkot, Cumilla', 'Titas, Cumilla', 'Chhagalniya, Feni', 'Daganbhuiyan, Feni', 'Amtali, Barguna', 'Bamna, Barguna', 'Barguna Sadar, Barguna', 'Betagi, Barguna', 'Patharghata, Barguna', 'Taltali, Barguna', 'Agailjhara, Barishal', 'Babuganj, Barishal', 'Bakerganj, Barishal', 'Banaripara, Barishal', 'Barishal Sadar, Barishal', 'Hizla, Barishal', 'Mehendiganj, 
Barishal', 'Muladi, Barishal', 'Uzirpur, Barishal', 'Bhola-S, Bhola', 'Borhanuddin, Bhola', 'Charfassion, Bhola', 'Daulatkhan, Bhola', 'Lalmohan, Bhola', 'Monpura, Bhola', 'Tazumuddin, Bhola', 'Jhalokathi Sadar, Jhalokathi', 'Kathalia, Jhalokathi', 'Nalchity, Jhalokathi', 'Rajapur, Jhalokathi', 'Bauphal, Patuakhali', 'Dashmina, Patuakhali', 'Dumki, Patuakhali', 'Galachipa, Patuakhali', 'Kalapara, Patuakhali', 'Mirzaganj, Patuakhali', 'Patuakhali Sadar, Patuakhali', 'Rangabali, Patuakhali', 'Bhandaria, Pirojpur', 'Hathazari, Chittagong', 'Feni Sadar, Feni', 'Fulgazi, Feni', 'Porshuram, Feni', 'Sonagazi, Feni', 'Dighinala, Khagrachari', 'Guimara, Khagrachari', 'Khagrachari Sadar, Khagrachari', 'Laxmichari, Khagrachari', 'Mahalchari, Khagrachari', 'Manikchari, Khagrachari', 'Matiranga, Khagrachari', 'Panchari, Khagrachari', 'Ramgarh, Khagrachari', 'Komol Nagar,  Laxmipur', ' Laxmipur Sadar,  Laxmipur', 'Raipur,  Laxmipur', 'Ramganj,  Laxmipur', 'Ramgati,  Laxmipur', 'Begumganj, Noakhali', 'Chatkhil, Noakhali', 'Companiganj, Noakhali', 'Hatiya, Noakhali', 'Kabir Hat, Noakhali', 'Noakhali Sadar, Noakhali', 'Senbag, Noakhali', 'Sonaimuri, Noakhali', 'Subarna Char, Noakhali', 'Baghaichari, Rangamati', 'Barkal, Rangamati', 'Belaichari, Rangamati', 'Juraichari, Rangamati', 'Kaptai, Rangamati', 'Kaukhali, Rangamati', 'Langadu, Rangamati', 'Nanniarchar, Rangamati', 'Rajosthali, Rangamati', 'Rangamati Sadar, Rangamati', 'Dhamrai, Dhaka', 'Dohar, Dhaka', 'Keraniganj, Dhaka', 'Nawabganj, Dhaka', 'Savar, Dhaka', 'Alfadanga, Faridpur', 'Bhanga, Faridpur', 'Boalmari, Faridpur', 'Charbhadrasan, Faridpur', 'Faridpur Sadar, Faridpur', 'Madhukhali, Faridpur', 'Nagarkanda, Faridpur', 'Sadarpur, Faridpur', 'Saltha, Faridpur', 'Gazipur Sadar, Gazipur', 'Kaliakoir, Gazipur', 'Kaligonj, Gazipur', 'Kapasia, Gazipur', 'Sreepur, Gazipur', 'Gopalganj Sadar, Gopalganj', 'Kasiani, Gopalganj', 'Kotalipara, Gopalganj', 'Muksudpur, Gopalganj', 'Tungipara, Gopalganj', 'Austagram, Kishoreganj', 
'Bajitpur, Kishoreganj', 'Bhairab, Kishoreganj', 'Hossainpur, Kishoreganj', 'Itna, Kishoreganj', 'Karimganj, Kishoreganj', 'Katiadi, Kishoreganj', 'Kishoreganj Sadar, Kishoreganj', 'Kuliarchar, Kishoreganj', 'Mithamoin, Kishoreganj', 'Nikli, Kishoreganj', 'Pakundia, Kishoreganj', 'Tarail, Kishoreganj', 'Kalkini, Madaripur', 'Madaripur Sadar, Madaripur', 'Rajoir, Madaripur', 'Shibchar, Madaripur', 'Daulatpur, Manikganj', 'Ghior, Manikganj', 'Harirampur, Manikganj', 'Manikganj Sadar, Manikganj', 'Saturia, Manikganj', 'Shivalaya, Manikganj', 'Singair, Munshiganj', 'Gazaria, Munshiganj', 'Lauhajong, Munshiganj', 'Munshiganj Sadar, Munshiganj', 'Sirajdikhan, Munshiganj', 'Sreenagar, Munshiganj', 'Tongibari, Munshiganj', 'Araihazar, Narayanganj', 'Bandar, Narayanganj', 'Narayanganj Sadar, Narayanganj', 'Rupganj, Narayanganj', 'Sonargaon, Narayanganj', 'Belabo, Narshingdi', 'Monohardi, Narshingdi', 'Narshingdi Sadar, Narshingdi', 'Palash, Narshingdi', 'Raipura, Narshingdi', 'Shibpur, Narshingdi', 'Baliakandi, Rajbari', 'Goalanda, Rajbari', 'Kalukhali, Rajbari', 'Pangsha, Rajbari', 'Rajbari Sadar, Rajbari', 'Bhedarganj, Shariatpur', 'Damuddya, Shariatpur', 'Goshairhat, Shariatpur', 'Janjira, Shariatpur', 'Naria, Shariatpur', 'Shariatpur Sadar, Shariatpur', 'Basail, Tangail', 'Bhuapur, Tangail', 'Delduar, Tangail', 'Dhanbari, Tangail', 'Ghatail, Tangail', 'Gopalpur, Tangail', 'Kalihati, Tangail', 'Madhupur, Tangail', 'Mirzapur, Tangail', 'Nagarpur, Tangail', 'Shakhipur, Tangail', 'Tangail Sadar, Tangail', 'Bagerhat Sadar, Bagerhat', 'Chitalmari, Bagerhat', 'Fakirhat, Bagerhat', 'Kachua, Bagerhat', 'Mollahat, Bagerhat', 'Mongla, Bagerhat', 'Morrelganj, Bagerhat', 'Rampal, Bagerhat', 'Sharankhola, Bagerhat', 'Alamdanga, Chuadanga', 'Chuadanga Sadar, Chuadanga', 'Damurhuda, Chuadanga', 'Jibannagar, Chuadanga', 'Abhoynagar, Jashore', 'Bagherpara, Jashore', 'Chowgacha, Jashore', 'Jashore Sadar, Jashore', 'Jhikargacha, Jashore', 'Keshabpur, Jashore', 'Monirampur, Jashore', 
'Sarsha, Jashore', 'Harinakunda, Jhenaidah', 'Jhenaidah Sadar, Jhenaidah', 'Kaliganj, Jhenaidah', 'Kotchandpur, Jhenaidah', 'Moheshpur, Jhenaidah', 'Shailkupa, Jhenaidah', 'Batiaghata, Khulna', 'Dacope, Khulna', 'Dighalia, Khulna', 'Dumuria, Khulna', 'Koira, Khulna', 'Paikgacha, Khulna', 'Phultala, Khulna', 'Rupsa, Khulna', 'Terokhada, Khulna', 'Bheramara, Kushtia', 'Daulatpur, Kushtia', 'Khoksha, Kushtia', 'Kumarkhali, Kushtia', 'Kushtia Sadar, Kushtia', 'Mirpur, Kushtia', 'Magura Sadar, Magura', 'Mohammadpur, Magura', 'Salikha, Magura', 'Sreepur, Magura', 'Gangni, Meherpur', 'Meherpur Sadar, Meherpur', 'Mujibnagar, Meherpur', 'Kalia, Narail', 'Lohagara, Narail', 'Narail Sadar, Narail', 'Assasuni, Satkhira', 'Debhata, Satkhira', 'Kalaroa, Satkhira', 'Kaliganj, Satkhira', 'Satkhira Sadar, Satkhira', 'Shyamnagar, Satkhira', 'Tala, Satkhira', 'Bakshiganj, Jamalpur', 'Dewanganj, Jamalpur', 'Islampur, Jamalpur', 'Jamalpur Sadar, Jamalpur', 'Madarganj, Jamalpur', 'Melendah, Jamalpur', 'Sarishabari, Jamalpur', 'Bhaluka, Mymensingh', 'Dhobaura, Mymensingh', 'Fulbaria, Mymensingh', 'Gaffargaon, Mymensingh', 'Gouripur, Mymensingh', 'Haluaghat, Mymensingh', 'Ishwarganj, Mymensingh', 'Muktagacha, Mymensingh', 'Mymensingh Sadar, Mymensingh', 'Nandail, Mymensingh', 'Phulpur, Mymensingh', 'Tarakanda, Mymensingh', 'Trishal, Mymensingh', 'Atpara, Netrakona', 'Barhatta, Netrakona', 'Durgapur, Netrakona', 'Kalmakanda, Netrakona', 'Kendua, Netrakona', 'Khaliajuri, Netrakona', 'Madan, Netrakona', 'Mohanganj, Netrakona', 'Netrakona Sadar, Netrakona', 'Purbadhala, Netrakona', 'Jhenaigati, Sherpur', 'Nakla, Sherpur', 'Nalitabari, Sherpur', 'Sherpur Sadar, Sherpur', 'Sreebordi, Sherpur', 'Adamdighi, Bogura', 'Bogura Sadar, Bogura', 'Dhunot, Bogura', 'Dhupchancia, Bogura', 'Gabtali, Bogura', 'Kahaloo, Bogura', 'Nandigram, Bogura', 'Sariakandi, Bogura', 'Shajahanpur, Bogura', 'Sherpur, Bogura', 'Shibganj, Bogura', 'Sonatala, Bogura', 'Bholahat, Chapainawabganj', 'Gomostapur, 
Chapainawabganj', 'Nachol, Chapainawabganj', 'Nawabganj Sadar, Chapainawabganj', 'Shibganj, Chapainawabganj', 'Akkelpur, Joypurhat', 'Joypurhat Sadar, Joypurhat', 'Kalai, Joypurhat', 'Khetlal, Joypurhat', 'Panchbibi, Joypurhat', 'Atrai, Naogaon', 'Badalgachi, Naogaon', 'Dhamoirhat, Naogaon', 'Manda, Naogaon', 'Mohadevpur, Naogaon', 'Naogaon Sadar, Naogaon', 'Niamatpur, Naogaon', 'Patnitala, Naogaon', 'Porsha, Naogaon', 'Raninagar, Naogaon', 'Sapahar, Naogaon', 'Bagatipara, Natore', 'Baraigram, Natore', 'Gurudaspur, Natore', 'Lalpur, Natore', 'Naldanga, Natore', 'Natore Sadar, Natore', 'Singra, Natore', 'Atghoria, Pabna', 'Bera, Pabna', 'Bhangura, Pabna', 'Chatmohar, Pabna', 'Faridpur, Pabna', 'Ishwardi, Pabna', 'Pabna Sadar, Pabna', 'Santhia, Pabna', 'Sujanagar, Pabna', 'Bagha, Rajshahi', 'Bagmara, Rajshahi', 'Charghat, Rajshahi', 'Durgapur, Rajshahi', 'Godagari, Rajshahi', 'Mohanpur, Rajshahi', 'Paba, Rajshahi', 'Puthia, Rajshahi', 'Tanore, Rajshahi', 'Belkuchi, Sirajganj', 'Chowhali, Sirajganj', 'Kamarkhanda, Sirajganj', 'Kazipur, Sirajganj', 'Raiganj, Sirajganj', 'Shahzadpur, Sirajganj', 'Sirajganj Sadar, Sirajganj', 'Tarash, Sirajganj', 'Ullapara, Sirajganj', 'Birampur, Dinajpur', 'Birganj, Dinajpur', 'Birol, Dinajpur', 'Bochaganj, Dinajpur', 'Chirirbandar, Dinajpur', 'Dinajpur Sadar, Dinajpur', 'Fulbari, Dinajpur', 'Ghoraghat, Dinajpur', 'Hakimpur, Dinajpur', 'Kaharol, Dinajpur', 'Khanshama, Dinajpur', 'Nawabganj, Dinajpur', 'Parbatipur, Dinajpur', 'Fulchari, Gaibandha', 'Gaibandha Sadar, Gaibandha', 'Gobindaganj, Gaibandha', 'Palashbari, Gaibandha', 'Sadullapur, Gaibandha', 'Saghata, Gaibandha', 'Sundarganj, Gaibandha', 'Bhurungamari, Kurigram', 'Chilmari, Kurigram', 'Fulbari, Kurigram', 'Kurigram Sadar, Kurigram', 'Nageswari, Kurigram', 'Rajarhat, Kurigram', 'Char Rajibpur, Kurigram', 'Rowmari, Kurigram', 'Ulipur, Kurigram', 'Aditmari, Lalmonirhat', 'Hatibandha, Lalmonirhat', 'Kaliganj, Lalmonirhat', 'Lalmonirhat Sadar, Lalmonirhat', 'Patgram, Lalmonirhat', 
'Dimla, Nilphamari', 'Domar, Nilphamari', 'Jaldhaka, Nilphamari', 'Kishoreganj, Nilphamari', 'Nilphamari Sadar, Nilphamari', 'Sayedpur, Nilphamari', 'Atwari, Panchagarh', 'Boda, Panchagarh', 'Debiganj, Panchagarh', 'Panchagarh Sadar, Panchagarh', 'Tetulia, Panchagarh', 'Badarganj, Rangpur', 'Gangachara, Rangpur', 'Kaunia, Rangpur', 'Mithapukur, Rangpur', 'Pirgacha, Rangpur', 'Pirganj, Rangpur', 'Rangpur Sadar, Rangpur', 'Taraganj, Rangpur', 'Baliadangi, Thakurgaon', 'Haripur, Thakurgaon', 'Pirganj, Thakurgaon', 'Ranisankail, Thakurgaon', 'Thakurgaon Sadar, Thakurgaon', 'Ajmiriganj, Habiganj', 'Bahubal, Habiganj', 'Baniachong, Habiganj', 'Chunarughat, Habiganj', 'Habiganj Sadar, Habiganj', 'Lakhai, Habiganj', 'Madhabpur, Habiganj', 'Nabiganj, Habiganj', 'Sayestaganj, Habiganj', 'Barlekha, Moulvibazar', 'Juri, Moulvibazar', 'Kamalganj, Moulvibazar', 'Kulaura, Moulvibazar', 'Moulvibazar Sadar, Moulvibazar', 'Rajnagar, Moulvibazar', 'Sreemangal, Moulvibazar', 'Biswamvarpur, Sunamganj', 'Chatak, Sunamganj', 'Dakshin Sunamganj, Sunamganj', 'Derai, Sunamganj', 'Dharmapasha, Sunamganj', 'Dowarabazar, Sunamganj', 'Jagannathpur, Sunamganj', 'Jamalganj, Sunamganj', 'Sulla, Sunamganj', 'Sunamganj Sadar, Sunamganj', 'Tahirpur, Sunamganj', 'Balaganj, Sylhet', 'Beanibazar, Sylhet', 'Biswanath, Sylhet', 'Companiganj, Sylhet', 'Dakshin Surma, Sylhet', 'Fenchuganj, Sylhet', 'Golapganj, Sylhet', 'Gowainghat, Sylhet', 'Jointiapur, Sylhet', 'Kanaighat, Sylhet', 'Osmaninagar, Sylhet', 'Sylhet Sadar, Sylhet', 'Zakiganj, Sylhet', 'Gournadi, Barishal', 'Doarabazar, Sunamganj', 'Adabar, Dhaka', 'Uttar Khan, Dhaka', 'Uttara, Dhaka', 'Kadamtali, Dhaka', 'Kalabagan, Dhaka', 'Kafrul, Dhaka', 'Kamrangirchar, Dhaka', 'Cantonment, Dhaka', 'Kotwali, Dhaka', 'Khilkhet, Dhaka', 'Khilgaon, Dhaka', 'Gulshan, Dhaka', 'Gandaria, Dhaka', 'Chawkbazar, Dhaka', 'Demra, Dhaka', 'Turag, Dhaka', 'Tejgaon, Dhaka', 'Tejgaon Industrial Area, Dhaka', 'Dakshinkhan, Dhaka', 'Darus Salam, Dhaka', 'Dhanmondi, Dhaka', 
'New Market, Dhaka', 'Paltan, Dhaka', 'Pallabi, Dhaka', 'Bangshal, Dhaka', 'Badda, Dhaka', 'Bimanbandar, Dhaka', 'Motijheel, Dhaka', 'Mirpur, Dhaka', 'Mohammadpur, Dhaka', 'Jatrabari, Dhaka', 'Ramna, Dhaka', 'Rampura, Dhaka', 'Lalbagh, Dhaka', 'Shah Ali, Dhaka', 'Shahbagh, Dhaka', 'Sher-e-Bangla Nagar, Dhaka', 'Shyampur, Dhaka', 'Sabujbagh, Dhaka', 'Sutrapur, Dhaka', 'Hazaribagh, Dhaka', 'Banani, Dhaka', 'Bhashantek, Dhaka', 'Vatara, Dhaka', 'Mugda, Dhaka', 'Rupnagar, Dhaka', 'Wari, Dhaka', 'Bayejid Bostami, Chittagong', 'Bakalia, Chittagong', 'Chandgaon, Chittagong', 'Chittagong Bandar, Chittagong', 'Double Mooring, Chittagong', 'Halishahar, Chittagong', 'Kotwali, Chittagong', 'Khulshi, Chittagong', 'Pahartali, Chittagong', 'Panchlaish, Chittagong', 'Patenga, Chittagong', 'Daulatpur, Khulna', 'Khalishpur, Khulna', 'Khan Jahan Ali, Khulna', 'Kotwali, Khulna', 'Sonadanga, Khulna', 'Boalia, Rajshahi', 'Matihar, Rajshahi', 'Rajpara, Rajshahi', 'Shah Makhdum, Rajshahi']
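
The printed list shows a few entries with stray whitespace, for example ' Laxmipur Sadar,  Laxmipur'. The following is a minimal sketch of building the same strings while trimming each field. The helper name build_locations is hypothetical and not part of the notebook, and whether the extra spaces change the search results was not checked.

```python
def build_locations(rows):
    """Join each (upazila, district) pair into 'Upazila, District'.

    `rows` is any iterable of 2-item sequences, e.g. df_area.values.tolist().
    Every field is converted to str and stripped of surrounding whitespace.
    """
    locations = []
    for row in rows:
        parts = [str(item).strip() for item in row]
        locations.append(", ".join(parts))
    return locations


# Two rows taken from the printed output above, including the leading spaces
sample = [["Komol Nagar", " Laxmipur"], [" Laxmipur Sadar", " Laxmipur"]]
print(build_locations(sample))
```

With the notebook's data, the call would be build_locations(df_area.values.tolist()).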

Restaurant Scraper


In [ ]:
for location in locations:
  print("---------------------", location, "-----------------------")
  query_result = google_places.nearby_search(
          location=location, keyword='Restaurant',
          radius=radius) 

  if query_result:
  
    for place in query_result.places:
      place.get_details()

      place_id = place.details.get('place_id')
      name = place.name
      latitude = place.geo_location.get('lat')
      longitude = place.geo_location.get('lng')
      rating = place.rating
      number_of_reviews = place.details.get('user_ratings_total')
      affluence = place.details.get('price_level')
      address = place.formatted_address

      restaurant_data.append([place_id, name, latitude, longitude, rating, number_of_reviews, affluence, address])
      # print(place.details)

    # print(restaurant_data)
    print("--------------------- Scraped Restaurants: ", len(restaurant_data))
    time.sleep(5) 

    while query_result.has_next_page_token:
        query_result = google_places.nearby_search(location=location, keyword='Restaurant',
            radius=radius, pagetoken=query_result.next_page_token)
        
        for place in query_result.places:
          place.get_details()

          place_id = place.details.get('place_id')
          name = place.name
          latitude = place.geo_location.get('lat')
          longitude = place.geo_location.get('lng')
          rating = place.rating
          number_of_reviews = place.details.get('user_ratings_total')
          affluence = place.details.get('price_level')
          address = place.formatted_address

          restaurant_data.append([place_id, name, latitude, longitude, rating, number_of_reviews, affluence, address])
          # print(place.details)
        # print(restaurant_data)  
        print("--------------------- Scraped Restaurants: ", len(restaurant_data))
        time.sleep(5) 

  time.sleep(5)

# Dumping the data into a DataFrame
df_restaurant = pd.DataFrame(restaurant_data, columns=['place_id', 'name', 'latitude', 'longitude', 'rating', 'number_of_reviews', 'affluence', 'address'])

df_restaurant.to_csv("restaurants.csv", index=False, encoding='utf-8')
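
The scraper cell repeats the same field extraction for the first page and for every following page. Below is a sketch of how that logic could be factored into two helpers. The names place_to_row and scrape_location are hypothetical; the attributes and the nearby_search arguments are only the ones already used in the cell above, and the `pause` parameter mirrors its 5 second wait.

```python
import time


def place_to_row(place):
    """Fetch the details of one place and flatten it into a list of fields."""
    place.get_details()
    return [
        place.details.get('place_id'),
        place.name,
        place.geo_location.get('lat'),
        place.geo_location.get('lng'),
        place.rating,
        place.details.get('user_ratings_total'),
        place.details.get('price_level'),
        place.formatted_address,
    ]


def scrape_location(client, location, radius, pause=5):
    """Collect one row per place from every result page of one location."""
    rows = []
    result = client.nearby_search(location=location, keyword='Restaurant',
                                  radius=radius)
    if not result:
        return rows
    while True:
        rows.extend(place_to_row(place) for place in result.places)
        if not result.has_next_page_token:
            break
        time.sleep(pause)  # wait before requesting the next page, as the cell above does
        result = client.nearby_search(location=location, keyword='Restaurant',
                                      radius=radius,
                                      pagetoken=result.next_page_token)
    return rows
```

Inside the loop over locations, this would be used as restaurant_data.extend(scrape_location(google_places, location, radius)).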

Data Preparation


In [ ]:
restaurant_df = pd.read_csv("/content/restaurants.csv", encoding='utf-8')

display(restaurant_df.duplicated().sum())
1945

There are 1,945 duplicate rows in the dataframe.

In [ ]:
restaurant_df.drop_duplicates(keep="first", inplace=True)
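
Called without arguments, drop_duplicates removes only rows that are identical in every column. Because neighbouring search circles can overlap, the same place_id could in principle come back with a slightly different rating or review count; whether that happens in this dataset was not verified. A small sketch on made-up rows shows the difference between the two ways of deduplicating:

```python
import pandas as pd

# Toy rows (not from the scraped data); the second "b" row differs in one field
toy = pd.DataFrame({
    "place_id": ["a", "a", "b", "b"],
    "name": ["Cafe A", "Cafe A", "Hotel B", "Hotel B"],
    "number_of_reviews": [10, 10, 5, 6],
})

exact = toy.drop_duplicates(keep="first")                       # full-row match
by_id = toy.drop_duplicates(subset=["place_id"], keep="first")  # one row per place

print(len(exact), len(by_id))
```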

Here, I kept the address of each restaurant to check whether it is in Bangladesh. As shown below, 62 rows have an address that does not mention Bangladesh; the ones displayed are in India.

In [ ]:
res_not_bangladesh = restaurant_df[restaurant_df['address'].str.contains('Bangladesh')==False]
res_not_bangladesh
Out[ ]:
place_id name latitude longitude rating number_of_reviews affluence address
2249 ChIJM3pmh-oZUzcRxEj0i0X72NM Juice🍹& Spice🌶 23.003974 91.729881 0.0 NaN NaN 2P3H+HXJ, Sabroom, Tripura 799145, India
2252 ChIJ2dxRyT0ZUzcRVlumPmgbV3c Pushpa fast food 23.000829 91.727325 4.7 9.0 NaN Pushpa fast food chotokhil Rd, opposite of bag...
2258 ChIJi_nlqHcZUzcRKjNPd5Cctnk Upalabdhi Food Plaza 23.002733 91.729778 3.5 4.0 NaN 2P3H+3WR, Sabroom, Tripura 799145, India
2264 ChIJAWkn0lsZUzcRI1rLiUVf23o Sabroom New Bus Stand 23.009141 91.725767 4.5 2.0 NaN 2P5G+M82, Sabroom, Tripura 799145, India
6347 ChIJczRZgTMB-zkRgvpD1QO1Yt8 M/S. Prapty Caterer & NANDITA Biriyani House 24.959450 88.240887 2.0 3.0 NaN Aiho, West Bengal 732121, India
... ... ... ... ... ... ... ... ...
8967 ChIJZxA1UjjTUTcRh4Rn0s1dzog Corner Cafe 24.871413 92.359391 3.3 132.0 NaN V9C5+HQ6, Karimganj, Assam 788710, India
8968 ChIJS34qNcrTUTcRnUTJMI3uwL0 Hotel City View 24.869049 92.364834 3.8 5.0 NaN Main Road, opp. Congress Office, Karimganj, As...
8969 ChIJw27U4zvTUTcRx5sfB5fjcPU Mamoni Hotel 24.869008 92.356553 3.2 34.0 NaN Circuit House Rd, Karimganj, Assam 788710, India
8970 ChIJ7WS_-EPTUTcRmnPXC2fm4W4 Ahar Hotel And Aheli Restaurant 24.867414 92.366639 3.6 250.0 NaN Shiv Bari Rd, Near Shib Mandir, Karimganj, Ass...
8971 ChIJ2VCwvvzTUTcRyJM86bkiGzU Kasturi Passport Centres 24.867124 92.369013 5.0 1.0 NaN Station Rd, Karimganj, Assam 788710, India

62 rows × 8 columns

In [ ]:
restaurant_df = restaurant_df[restaurant_df['address'].str.contains('Bangladesh')==True]
restaurant_df.reset_index(drop=True, inplace=True)

Now the dataframe restaurant_df contains only Bangladeshi restaurants.
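
The same filter can be written with a single boolean mask. This sketch uses made-up rows; the na=False argument is an addition that makes rows with a missing address count as "not in Bangladesh" instead of producing a missing value in the mask.

```python
import pandas as pd

toy = pd.DataFrame({
    "name": ["Corner Cafe", "Bindu Hotel And Restaurant", "No Address"],
    "address": ["Karimganj, Assam 788710, India",
                "Station Rd, Rajshahi 6000, Bangladesh",
                None],
})

# True where the address text contains "Bangladesh"; missing addresses give False
in_bd = toy["address"].str.contains("Bangladesh", na=False)

kept = toy[in_bd].reset_index(drop=True)
dropped = toy[~in_bd]

print(kept["name"].tolist(), len(dropped))
```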

In [ ]:
def missing_value_describe(data):
    # check missing values in the data
    total = data.isna().sum().sort_values(ascending=False)
    missing_value_pct_stats = (data.isnull().sum() / len(data)*100)
    missing_value_col_count = sum(missing_value_pct_stats > 0)

    # missing_value_stats = missing_value_pct_stats.sort_values(ascending=False)[:missing_value_col_count]
    missing_data = pd.concat([total, missing_value_pct_stats], axis=1, keys=['Total', 'Percent'])

    print("Number of rows with at least 1 missing values:", data.isna().any(axis = 1).sum())
    print("Number of columns with missing values:", missing_value_col_count)

    if missing_value_col_count != 0:
        # print out column names with missing value percentage
        print("\nMissing percentage (descending):")
        display(missing_data[:missing_value_col_count])

        # plot missing values
        missing = data.isnull().sum()
        missing = missing[missing > 0]
        missing.sort_values(inplace=True)
        missing.plot.bar()
    else:
        print("No missing data!!!")

# pass a dataframe to the function
missing_value_describe(restaurant_df)
Number of rows with at least 1 missing values: 9675
Number of columns with missing values: 3

Missing percentage (descending):
Total Percent
affluence 9672 90.426328
number_of_reviews 2528 23.635004
rating 1 0.009349

Converting the Affluence Level 1.0, 2.0, 3.0... to $, $$, $$$...

In [ ]:
restaurant_df['affluence'] = restaurant_df['affluence'].replace([1.0, 2.0, 3.0, 4.0],['$', '$$', '$$$', '$$$$'])
restaurant_df[restaurant_df['affluence'].notna()==True]
Out[ ]:
place_id name latitude longitude rating number_of_reviews affluence address
12 ChIJJwuMBKoLADoRcGUOid7tMUg ক্যাফে আড্ডা মঠবাড়িয়া 22.286668 89.958363 3.9 165.0 $$ 7XP5+M89, Mathbaria, Bangladesh
30 ChIJR6ppFA5RVTcR7vhuJ_FPWME Touhid Tea Store 22.746173 90.103694 4.1 37.0 $ Swarupkathi Bridge, Swarupkathi, Bangladesh
81 ChIJkdicn2gBADoRZgU_lUomDDw Hotel Rose Garden 22.578679 89.968584 3.8 81.0 $ Post Office Rd, Pirojpur Pourashava, Bangladesh
87 ChIJEyplIFEAADoRw_6MExFtfr8 Hotel Apyayon 22.579484 89.969643 3.5 31.0 $ HXH9+QVR, Pirojpur Pourashava, Bangladesh
92 ChIJIRrrteQJVDcRi_9mQMEBerA Nawab Chinese Restaurant and Party Center 23.865913 91.206187 4.4 50.0 $ Shahid Amir Hossen Road (1st floor, আখাউড়া, B...
... ... ... ... ... ... ... ... ...
10669 ChIJ-daAjkXu-zkRKA09tfrQTSI Party Point Thai and Chinese Restaurant 24.386058 88.608380 3.9 196.0 $$ Ground floor,1st and 2nd Floor, B.G.B Gate Rif...
10681 ChIJKx-b407u-zkR7VQuYVQ6ysk Razia Chinese and Thai Restaurant 24.383917 88.607963 3.7 49.0 $$ R685, Rajshahi 6203, Bangladesh
10685 ChIJbUqeQkXu-zkRPx152Vjuq3w Muskan Hotel And Restaurant 24.388027 88.606072 3.8 99.0 $ Bisik Match Factory Moor, Sapura, Boalia, Rajs...
10686 ChIJJ0af2v_u-zkRhBVjVemblw0 Mona Hotel & Restaurant 24.375834 88.593381 3.7 14.0 $ 9HGV+89J, Rajshahi, Bangladesh
10694 ChIJB-JYPK3v-zkRzte9zqVK9vY Bindu Hotel And Restaurant 24.374020 88.603169 3.8 689.0 $ Station Rd, Rajshahi 6000, Bangladesh

1024 rows × 8 columns
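
An alternative to replace is a dictionary lookup with Series.map, sketched below on made-up values. One behavioural difference is worth noting: map turns every value that is not a key of the dictionary into a missing value, whereas replace leaves such values unchanged. The Places API also defines a price level of 0 (free); whether it occurs in this dataset was not checked.

```python
import pandas as pd

# Keys are floats because the column is read back from CSV as float
PRICE_SYMBOLS = {1.0: "$", 2.0: "$$", 3.0: "$$$", 4.0: "$$$$"}

levels = pd.Series([1.0, None, 3.0, 0.0])
symbols = levels.map(PRICE_SYMBOLS)

print(symbols.tolist())
```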

Saving the final dataframe into CSV

In [ ]:
final_df = restaurant_df[['name', 'latitude', 'longitude', 'rating', 'number_of_reviews', 'affluence']]
display(final_df)
final_df.to_csv("bangladesh_restaurants.csv", index=False, encoding='utf-8')
name latitude longitude rating number_of_reviews affluence
0 Jamal Store, Joykul Bazaar 22.604275 90.094718 0.0 NaN NaN
1 Salma Varaitis Store 22.619158 90.105594 5.0 1.0 NaN
2 হাজী বিরিয়ানি হাউজ 22.289046 89.958509 5.0 1.0 NaN
3 নিউ মুসলিম সুইটস এণ্ড বেকারি 22.288710 89.958482 5.0 4.0 NaN
4 মেসার্স সততা হোটেল এন্ড রেস্টুরেন্ট 22.286784 89.958116 0.0 NaN NaN
... ... ... ... ... ... ...
10691 Green castle 24.374087 88.600196 4.0 5.0 NaN
10692 Matir Manus 24.374515 88.604166 0.0 NaN NaN
10693 NR Home Kitchen 24.373602 88.600796 5.0 1.0 NaN
10694 Bindu Hotel And Restaurant 24.374020 88.603169 3.8 689.0 $
10695 ডালাস হোটেল অ্যান্ড রেস্টুরেন্ট 24.374205 88.603853 0.0 NaN NaN

10696 rows × 6 columns

Data Analysis


In [3]:
bd_restaurant = pd.read_csv("/content/bangladesh_restaurants.csv", encoding='utf-8')
display(bd_restaurant)
name latitude longitude rating number_of_reviews affluence
0 Jamal Store, Joykul Bazaar 22.604275 90.094718 0.0 NaN NaN
1 Salma Varaitis Store 22.619158 90.105594 5.0 1.0 NaN
2 হাজী বিরিয়ানি হাউজ 22.289046 89.958509 5.0 1.0 NaN
3 নিউ মুসলিম সুইটস এণ্ড বেকারি 22.288710 89.958482 5.0 4.0 NaN
4 মেসার্স সততা হোটেল এন্ড রেস্টুরেন্ট 22.286784 89.958116 0.0 NaN NaN
... ... ... ... ... ... ...
10691 Green castle 24.374087 88.600196 4.0 5.0 NaN
10692 Matir Manus 24.374515 88.604166 0.0 NaN NaN
10693 NR Home Kitchen 24.373602 88.600796 5.0 1.0 NaN
10694 Bindu Hotel And Restaurant 24.374020 88.603169 3.8 689.0 $
10695 ডালাস হোটেল অ্যান্ড রেস্টুরেন্ট 24.374205 88.603853 0.0 NaN NaN

10696 rows × 6 columns

In [66]:
bd_restaurant.describe()
Out[66]:
latitude longitude rating number_of_reviews
count 10696.000000 10696.000000 10695.000000 8168.000000
mean 23.811142 90.290969 3.122422 112.568560
std 1.048579 1.033504 1.865074 550.342005
min 20.856284 88.128098 0.000000 1.000000
25% 23.034571 89.516242 1.000000 2.000000
50% 23.765771 90.364833 4.000000 6.000000
75% 24.502531 90.983450 4.400000 37.000000
max 26.494126 92.438711 5.000000 17655.000000
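
In the rows displayed so far, every rating of 0.0 is paired with a missing number_of_reviews. This suggests, as an assumption and not a verified fact, that 0.0 stands for "not rated yet" and is not a real score. If that holds, the mean and the lower quartile above are pulled down by unrated places. A sketch on made-up rows of how rated places could be separated:

```python
import pandas as pd

toy = pd.DataFrame({
    "rating": [0.0, 5.0, 3.8, 0.0],
    "number_of_reviews": [None, 1.0, 689.0, None],
})

# Assumption: a 0.0 rating with no review count means the place is unrated
unrated = (toy["rating"] == 0) & toy["number_of_reviews"].isna()

overall_mean = toy["rating"].mean()
rated_mean = toy.loc[~unrated, "rating"].mean()

print(int(unrated.sum()), round(overall_mean, 2), round(rated_mean, 2))
```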

Some of the names are in Bangla. Let's separate the restaurants by name: a name that starts with a Latin letter is labeled English, and every other name is labeled Bangla.

In [4]:
reg = re.compile(r'[a-zA-Z]')

bd_restaurant["name_type"] = bd_restaurant["name"].apply(lambda x: "English" if reg.match(x) else "Bangla")

en_bd_restaurant = bd_restaurant[bd_restaurant['name_type'] == "English"]
non_en_bd_restaurant = bd_restaurant[bd_restaurant['name_type'] == "Bangla"]    

printmd("### Restaurants With English Name")
display(en_bd_restaurant)
printmd("### Restaurants With Bangla Name")
display(non_en_bd_restaurant)

Restaurants With English Name

name latitude longitude rating number_of_reviews affluence name_type
0 Jamal Store, Joykul Bazaar 22.604275 90.094718 0.0 NaN NaN English
1 Salma Varaitis Store 22.619158 90.105594 5.0 1.0 NaN English
5 Sharif food fair 22.289866 89.959118 5.0 2.0 NaN English
11 Food Club The Caterer's 22.287912 89.958609 4.5 8.0 NaN English
13 New Muslim Sweets & bekare 22.288718 89.958737 5.0 3.0 NaN English
... ... ... ... ... ... ... ...
10690 lead generation 24.374515 88.604166 0.0 NaN NaN English
10691 Green castle 24.374087 88.600196 4.0 5.0 NaN English
10692 Matir Manus 24.374515 88.604166 0.0 NaN NaN English
10693 NR Home Kitchen 24.373602 88.600796 5.0 1.0 NaN English
10694 Bindu Hotel And Restaurant 24.374020 88.603169 3.8 689.0 $ English

7527 rows × 7 columns

Restaurants With Bangla Name

name latitude longitude rating number_of_reviews affluence name_type
2 হাজী বিরিয়ানি হাউজ 22.289046 89.958509 5.0 1.0 NaN Bangla
3 নিউ মুসলিম সুইটস এণ্ড বেকারি 22.288710 89.958482 5.0 4.0 NaN Bangla
4 মেসার্স সততা হোটেল এন্ড রেস্টুরেন্ট 22.286784 89.958116 0.0 NaN NaN Bangla
6 সোহেল রানা কাটোল ফাম 22.293492 89.962316 5.0 1.0 NaN Bangla
7 মেসার্স সৌখিন ষ্টীল এন্ড পার্টেক্স ফার্নিচার 22.285984 89.957330 0.0 NaN NaN Bangla
... ... ... ... ... ... ... ...
10661 ১৩ পার্বন 24.376295 88.622616 4.7 7.0 NaN Bangla
10667 আহার্য - Aharjo 24.385239 88.626540 5.0 1.0 NaN Bangla
10676 নিউ তৃপ্তি হোটেল এন্ড রেস্টুরেন্ট 24.407021 88.615485 4.0 101.0 NaN Bangla
10688 সালমান বাংলা খাবার হোটেল / বাবুল বাংলা খাবার হ... 24.405724 88.592444 5.0 2.0 NaN Bangla
10695 ডালাস হোটেল অ্যান্ড রেস্টুরেন্ট 24.374205 88.603853 0.0 NaN NaN Bangla

3169 rows × 7 columns
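
The split above looks only at the first character, so a name such as "আহার্য - Aharjo" ends up under Bangla although it contains both scripts. Below is a sketch of a classifier that searches the whole name, using the Bengali Unicode block \u0980-\u09FF that also appears in the word cloud cell. The function name name_script and the labels "Mixed" and "Other" are hypothetical.

```python
import re

BANGLA_CHARS = re.compile(r"[\u0980-\u09FF]")  # Bengali Unicode block
LATIN_CHARS = re.compile(r"[A-Za-z]")


def name_script(name):
    """Label a name by the scripts it contains, wherever they occur."""
    has_bangla = bool(BANGLA_CHARS.search(name))
    has_latin = bool(LATIN_CHARS.search(name))
    if has_bangla and has_latin:
        return "Mixed"
    if has_bangla:
        return "Bangla"
    if has_latin:
        return "English"
    return "Other"


for sample in ["Green castle", "আহার্য - Aharjo", "১৩ পার্বন"]:
    print(sample, "->", name_script(sample))
```

It could be applied with bd_restaurant["name"].apply(name_script).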

In [5]:
data = en_bd_restaurant.name.value_counts().to_dict()

wc = WordCloud(width=800, height=400,background_color="white", max_font_size=300).generate_from_frequencies(data)
plt.figure(figsize=(14,10))
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()
result = wc.to_file("English_word_cloud.png")
printmd("### These are the Most Frequently Used Restaurant Names in English")

These are the Most Frequently Used Restaurant Names in English

In [7]:
from bnlp.corpus import stopwords, punctuations
regex = r"[\u0980-\u09FF]+" 
data = non_en_bd_restaurant.name.value_counts().to_dict()

wc = WordCloud(width=800, height=400,background_color="white", max_font_size=300, font_path="/content/Siyamrupali.ttf", regexp=regex).generate_from_frequencies(data)
plt.figure(figsize=(14,10))
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()
result = wc.to_file("Bangla_word_cloud.png")
printmd("### These are the Most Frequently Used Restaurant Names in Bangla")

These are the Most Frequently Used Restaurant Names in Bangla

Heat Map

In [11]:
import geopandas
import folium
from folium.plugins import MarkerCluster, HeatMap

geometry = geopandas.points_from_xy(bd_restaurant.longitude, bd_restaurant.latitude)
geo_df = geopandas.GeoDataFrame(bd_restaurant[['longitude', 'latitude']], geometry=geometry)

geo_df.head()

bd_coordinate = [23.6850, 90.3563]

site_map = folium.Map(location=bd_coordinate, tiles='Cartodb dark_matter', zoom_start=8)
heat_data = [[point.xy[1][0], point.xy[0][0]] for point in geo_df.geometry ]

# heat_data
HeatMap(heat_data).add_to(site_map)

site_map
Out[11]:
[Interactive folium heat map of restaurant locations; not rendered in this export]
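
In the cell above, the geopandas round trip only turns the two coordinate columns into [lat, lng] pairs. A sketch, on made-up rows, of building the same list directly from the dataframe; dropping rows with a missing coordinate is an added safeguard, not something the notebook does.

```python
import pandas as pd

toy = pd.DataFrame({
    "latitude": [22.604275, 24.374020, None],
    "longitude": [90.094718, 88.603169, 90.0],
})

# HeatMap takes a list of [lat, lng] pairs; incomplete rows are dropped first
heat_data = toy[["latitude", "longitude"]].dropna().values.tolist()
print(heat_data)
```

With the notebook's data, toy would be bd_restaurant and the resulting list would be passed to HeatMap(heat_data).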

Restaurants With Price Range

Plotting only those restaurants that have price levels

  • $ means Cheap
  • $$ means Moderate
  • $$$ means Expensive
  • $$$$ means Very Expensive
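
The legend above can be kept in a small lookup so that a marker tooltip shows the meaning next to the symbol. This is a sketch; PRICE_LABELS and price_tooltip are hypothetical names.

```python
PRICE_LABELS = {
    "$": "Cheap",
    "$$": "Moderate",
    "$$$": "Expensive",
    "$$$$": "Very Expensive",
}


def price_tooltip(name, affluence):
    """Build tooltip text such as 'Bindu Hotel And Restaurant, $ (Cheap)'."""
    label = PRICE_LABELS.get(affluence, "Unknown")
    return f"{name}, {affluence} ({label})"


print(price_tooltip("Bindu Hotel And Restaurant", "$"))
```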
In [69]:
bd_coordinate = [23.6850, 90.3563]
site_map = folium.Map(location=bd_coordinate, zoom_start=7)

data = bd_restaurant[bd_restaurant['affluence'].notna()==True]

for i in range(0, len(data)):
    folium.Marker(
        location=[data.iloc[i]['latitude'], data.iloc[i]['longitude']],
        popup=data.iloc[i]['name'],
        tooltip=str(data.iloc[i]['name'])+','+str(data.iloc[i]['affluence'])
    ).add_to(site_map)
site_map
Out[69]:
[Interactive folium map with one marker per restaurant that has a price level; not rendered in this export]

Restaurants According to Reviews

In [42]:
bd_coordinate = [23.6850, 90.3563]
circle_map = folium.Map(location=bd_coordinate, zoom_start=8, prefer_canvas=True,)
# .copy() gives an independent frame, so the column assignment below does not raise SettingWithCopyWarning
data = bd_restaurant[bd_restaurant['affluence'].notna()].copy()

# Treat a missing review count as zero reviews
data['number_of_reviews'] = data['number_of_reviews'].fillna(0).astype(int)

occurences = folium.map.FeatureGroup()

n_mean = data['number_of_reviews'].mean()

for lat, lng, number, name in zip(data['latitude'],
                                        data['longitude'],
                                        data['number_of_reviews'], data['name']):
  occurences.add_child(
      folium.vector_layers.CircleMarker(
          [lat, lng],
          radius=number/(n_mean/3), # radius for number of occurrences
          color='yellow',
          fill=True,
          fill_color='blue',
          fill_opacity=0.4,
          tooltip=str(number)+','+str(name),
          # get more from tooltip https://github.com/python-visualization/folium/issues/1010#issuecomment-435968337
      )
  )

circle_map.add_child(occurences)
Out[42]:
[Interactive folium map with circles sized by number of reviews; not rendered in this export]
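
Review counts are heavily skewed (in the describe() output the median is 6 and the maximum is 17655), so a radius that grows linearly with the count makes a few circles very large. Below is a sketch of a square-root scaling with fixed bounds. The function name review_radius and its default limits are assumptions chosen for illustration.

```python
import math


def review_radius(n_reviews, min_radius=2.0, max_radius=25.0, max_reviews=17655):
    """Map a review count to a circle radius in pixels with sqrt scaling.

    Counts are clamped to [0, max_reviews], so the result always lies
    between min_radius and max_radius.
    """
    n = max(0, min(n_reviews, max_reviews))
    scale = math.sqrt(n / max_reviews)
    return min_radius + (max_radius - min_radius) * scale


for count in (0, 6, 689, 17655):
    print(count, round(review_radius(count), 1))
```

In the loop above, this would replace the expression passed as radius, i.e. radius=review_radius(number).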

Expensive Restaurants with Ratings

In [62]:
data = bd_restaurant[bd_restaurant['affluence'].notna()==True]
data_expensive = data[data['affluence'] == "$$$"]


bd_coordinate = [23.6850, 90.3563]
expensive_map = folium.Map(location=bd_coordinate, zoom_start=10, prefer_canvas=True,)

for i in range(0, len(data_expensive)):
    folium.Marker(
        location=[data_expensive.iloc[i]['latitude'], data_expensive.iloc[i]['longitude']],
        # popup=data_expensive.iloc[i]['name'],
        tooltip=str(data_expensive.iloc[i]['name'])+','+str(data_expensive.iloc[i]['rating'])
    ).add_to(expensive_map)

expensive_map
Out[62]:
[Interactive folium map of $$$ restaurants with ratings in the tooltips; not rendered in this export]

Very Expensive Restaurants with Ratings

In [63]:
data = bd_restaurant[bd_restaurant['affluence'].notna()==True]
data_very_expensive = data[data['affluence'] == "$$$$"]


bd_coordinate = [23.6850, 90.3563]
very_expensive_map = folium.Map(location=bd_coordinate, zoom_start=10, prefer_canvas=True,)

for i in range(0, len(data_very_expensive)):
    folium.Marker(
        location=[data_very_expensive.iloc[i]['latitude'], data_very_expensive.iloc[i]['longitude']],
        # popup=data_expensive.iloc[i]['name'],
        tooltip=str(data_very_expensive.iloc[i]['name'])+','+str(data_very_expensive.iloc[i]['rating'])
    ).add_to(very_expensive_map)

very_expensive_map
Out[63]:
[Interactive folium map of $$$$ restaurants with ratings in the tooltips; not rendered in this export]

Remarks


The dataset may contain some anomalies, such as tea stores or food stores that are also returned for the Restaurant keyword. More extensive cleaning could handle such cases in the future.
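
As a starting point for that cleaning, names could be flagged with a simple keyword check. This is only a sketch: the keyword list is an assumption based on a few names visible in the outputs above (for example "Touhid Tea Store" and "Kasturi Passport Centres"), it covers English names only, and it will produce both false positives and misses.

```python
# Hypothetical keyword list; extend it (including Bangla terms) after inspecting the data
NON_RESTAURANT_KEYWORDS = ("store", "furniture", "passport")


def looks_like_non_restaurant(name):
    """Return True if the name contains any of the flagged keywords."""
    lowered = name.lower()
    return any(keyword in lowered for keyword in NON_RESTAURANT_KEYWORDS)


for sample in ["Touhid Tea Store", "Kasturi Passport Centres", "Bindu Hotel And Restaurant"]:
    print(sample, "->", looks_like_non_restaurant(sample))
```

Flagged rows are better reviewed by hand than dropped automatically, since a name containing "Store" may still belong to a place that serves food.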